1. INTRODUCTION

Super-resolution (SR) is the process of recovering a high-resolution (HR) image from one or more
low-resolution (LR) input images. Many applications, such as satellite imaging, high-definition television
(HDTV), microscopy, traffic surveillance, military and security monitoring, medical diagnosis, and remote
sensing, require good-quality images for accurate analysis. The known variables in the LR images are fewer
than the unknown variables in the HR image. In general, a sufficient number of LR images is not available,
and the blurring operators are unknown. Hence, SR reconstruction is an ill-posed problem. Many
regularization techniques have been discussed for solving this ill-posed problem [1], [2].

The present work aims to recover the SR version of an image from a single LR image. In conventional
dictionary learning, one dictionary is trained on LR image patches and another dictionary is trained on
HR image patches. The HR image is then recovered using sparse representation. In this approach, it is
difficult to recover the high-frequency details completely because the size of the dictionary is limited. To
overcome this problem, the high frequency to be recovered can be considered as a combination of main
high frequency (MHF) and residual high frequency (RHF).

The proposed method comprises two levels of dictionary learning, that is, it is a two-layer algorithm.
High-frequency details are estimated step by step using distinct dictionaries. First, MHF is recovered by
main dictionary learning, which reduces the gap in the frequency spectrum. Afterwards, RHF is
reconstructed by residual dictionary learning, which narrows the gap in the frequency spectrum further. The
method is analogous to coarse-to-fine recovery and yields better results. Orthogonal matching pursuit (OMP)
is used for generating the sparse representation coefficients of the patches. The K-means singular value
decomposition (K-SVD) algorithm is used for training the dictionaries.

This paper is arranged as follows. Section 2 reviews the related work on dictionary learning.
Section 3 introduces the concepts of sparse coding and dictionary learning. Section 4 presents the
mathematical basics of dictionary learning. Section 5 discusses the proposed method of SR from dual
dictionary learning. Section 6 presents the experimental evaluation and summarizes the results. Section 7
concludes the paper.


2. RELATED WORK 

Dictionary learning is one of the important approaches to single-image super-resolution [3]. Dictionary
learning for SR was introduced by Yang et al. [4], in which two dictionaries were jointly trained, one for LR
image patches and the other for HR image patches. Zhang et al. [5] developed a computationally efficient
method by replacing the sparse recovery step with a matrix multiplication. He et al. [6] used a Bayesian
method employing a beta process prior for learning the dictionaries, which made them more consistent
between the two feature spaces. Bhatia et al. [7] proposed a technique that combined coupled dictionary
learning with example-based super-resolution for high-fidelity reconstruction. Yang et al. [8] presented
regularized K-SVD for training the dictionary and employed regularized orthogonal matching pursuit (ROMP)
to obtain the sparse representation coefficients of the patches. Ahmed et al. [9] discussed coupled
dictionaries in which groups of clustered data are designed based on the correlation between data patches,
which enables the recovery of fine details. Dictionary learning methods use a large number of image
features for learning, and their performance degrades for complex images. This limitation was overcome by
Zhao et al. [10] by combining deep learning features with the dictionary technique. It is difficult to
represent different images with a single universal dictionary. Hence, Yang et al. [11] introduced a fuzzy
clustering and weighting method to overcome this limitation. Deeba et al. [12] proposed integrated
dictionary learning, in which residual image learning is combined with the K-SVD algorithm. Wavelets are
used in this method, which yield better sparsity and structural details of the image. Huang and Dragotti [13]
addressed the problem of single-image super-resolution by using a deep dictionary learning architecture.
Instead of multilayer dictionaries, L dictionaries are used, which are divided into a synthesis model and
an analysis model. High-level features are extracted by the analysis dictionaries, and the regression
function is optimized by the synthesis dictionary. Each method aimed to improve the reconstructed
super-resolution image by using different algorithms and approaches.


3. SPARSE CODING AND DICTIONARY LEARNING 

Sparse coding is a learning method for obtaining a sparse representation of the input. Any signal or
image patch can be represented as a linear combination of only a few basic elements. Each basic element is
known as an atom, and a collection of atoms forms a dictionary. A high-dimensional signal can be recovered
from only a few linear measurements, provided that the signal is sparse. Most natural images admit a
sparse representation. If an image is not sparse, it can be made sparse by means of predefined
dictionaries such as the discrete cosine transform (DCT), discrete Fourier transform (DFT), wavelets,
contourlets, and curvelets. However, these dictionaries are suitable only for particular classes of images.
Learning the dictionary instead of using predefined dictionaries greatly improves the performance [14]. In
dictionary learning, the dictionary is tuned to the input images or signals.

Different types of dictionary learning algorithms are available, namely method of optimal directions
(MOD), K-SVD, stochastic gradient descent, Lagrange dual method, and least absolute shrinkage and selection
operator (LASSO). The process of updating the dictionary is simple in MOD. K-SVD performs better than MOD,
but it has higher computational complexity for updating the atoms. Stochastic gradient descent is fast
compared to MOD and K-SVD. Unlike K-SVD, stochastic gradient descent works well with fewer training
samples. The advantage of the Lagrange dual method is its lower computational complexity. LASSO can solve
the $\ell_1$ minimization more efficiently. It minimizes the least-squares error, which yields the globally
optimal solution. Based on the sparsity-promoting function, sparse coding methods are classified into
three types: a) $\ell_0$ norm methods, b) $\ell_1$ norm methods, and c) methods with a non-convex
sparsity-promoting function [15].


4. MATHEMATICAL BASICS OF DICTIONARY LEARNING 

Let $D \in \mathbb{R}^{n \times K}$ be an overcomplete dictionary of $K$ atoms ($K > n$). If a signal
$x \in \mathbb{R}^{n}$ can be represented as a sparse linear combination with respect to $D$, then $x$ can
be written as $x = D\alpha_0$, where $\alpha_0 \in \mathbb{R}^{K}$ is a vector with very few non-zero
elements. Usually, only a few measurements $y$ of $x$ are available, as in (1) [4]:

$$y = Lx = LD\alpha_0 \tag{1}$$

where $L \in \mathbb{R}^{k \times n}$ with $k < n$ is a projection matrix, $x$ is an HR image patch, and
$y$ is its LR image patch. If $D$ is overcomplete, $x = D\alpha$ is underdetermined for the unknown
coefficients $\alpha$. Hence, $y = LD\alpha$ is even more underdetermined. It can be proved that, under mild
conditions, the sparsest solution $\alpha_0$ to this equation is unique. Hence, the sparse representation
of an HR image patch $x$ can be recovered from the LR image patch $y$.
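The sparse recovery step can be illustrated with a minimal orthogonal matching pursuit sketch. This is a plain-Python illustration under stated assumptions, not the implementation used in the experiments: the function names `omp`, `solve`, and `dot` are hypothetical, the dictionary is stored as a list of non-zero atoms, and the least-squares refit is done through the normal equations, which is adequate only for small, well-conditioned supports.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve the small square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def omp(atoms, y, sparsity, tol=1e-10):
    """Greedy sparse coding: approximate y with at most `sparsity` atoms.

    `atoms` is a list of dictionary columns (assumed non-zero); the return
    value is the coefficient vector with one entry per atom.
    """
    support, coeffs = [], []
    residual = y[:]
    for _ in range(sparsity):
        if dot(residual, residual) <= tol:
            break
        # Pick the unused atom most correlated with the current residual.
        scores = [abs(dot(a, residual)) / dot(a, a) ** 0.5 for a in atoms]
        for s in support:
            scores[s] = -1.0
        k = max(range(len(atoms)), key=scores.__getitem__)
        support.append(k)
        # Refit all selected coefficients by least squares (normal equations).
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], y) for i in support]
        coeffs = solve(gram, rhs)
        approx = [sum(c * atoms[i][t] for c, i in zip(coeffs, support))
                  for t in range(len(y))]
        residual = [yt - at for yt, at in zip(y, approx)]
    alpha = [0.0] * len(atoms)
    for i, c in zip(support, coeffs):
        alpha[i] = c
    return alpha


# Toy dictionary with four atoms in R^3; the signal uses atoms 0 and 2 only.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
alpha = omp(atoms, [2.0, 0.0, 3.0], sparsity=2)
```

In this toy case the two selected atoms reproduce the signal exactly, so `alpha` has non-zero entries only at positions 0 and 2.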

Two coupled dictionaries are utilized: $D_l$ is used for LR patches and $D_h$ is used for HR patches.
The sparse representation of an LR patch is obtained with respect to $D_l$. These sparse coefficients are
then used to recover the corresponding HR patch from $D_h$. For the SR of a test image, the learnt
dictionaries are applied to the test image: the sparse coefficients of the LR patches are obtained and are
used to select the atoms of the HR dictionary that are most appropriate for the corresponding HR patches.
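The coupled recovery described above can be summarized in one line. The following is a sketch in the notation of this section, writing $D_l$ for the LR dictionary and $D_h$ for the HR dictionary; the tolerance $\epsilon$ and the estimate $\hat{x}$ are symbols introduced here for illustration and are assumptions, not quantities defined in the text.

```latex
\alpha^{*} = \arg\min_{\alpha} \|\alpha\|_0
\quad \text{s.t.} \quad \|D_l \alpha - y\|_2^2 \le \epsilon,
\qquad
\hat{x} = D_h \alpha^{*}
```

The LR patch $y$ determines the sparse code, and the same code is reused with the HR dictionary to synthesize the HR patch.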


5. PROPOSED METHOD 

The proposed method consists of two stages: a dictionary learning stage and an image synthesis
stage. In the dictionary learning stage, dual dictionaries are trained, namely the main dictionary (MD)
and the residual dictionary (RD). The image super-resolution (synthesis) stage takes an input image and
performs super-resolution using the model trained in the previous stage.


5.1. Dictionary learning stage 

Two dictionaries, named the main dictionary and the residual dictionary, are learnt using sparse
representation [16]. Figure 1 depicts the training stage. Initially, a set of training HR images is
collected. To derive an LR low-frequency image $L_{LF}$, an HR training image denoted by $H_{ORG}$ is
blurred and then down-sampled. Bicubic interpolation is applied to $L_{LF}$, resulting in an HR
low-frequency image denoted by $H_{LF}$. By subtracting $H_{LF}$ from $H_{ORG}$, the HR high-frequency
image $H_{HF}$ is generated. Afterwards, MD is constructed, which is made up of two coupled
sub-dictionaries: the low-frequency main dictionary (LMD) and the high-frequency main dictionary (HMD).
Patches are extracted from $H_{LF}$ and $H_{HF}$ to build the training data $T = \{p_h^k, p_l^k\}_k$. Here,
$\{p_h^k\}_k$ is the set of patches extracted directly from the HR high-frequency image $H_{HF}$, and
$\{p_l^k\}_k$ is the set of patches obtained by first filtering $H_{LF}$ with high-pass filters and then
extracting patches from the filtered images.
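The construction of the training images can be written compactly as follows. This is a sketch of the steps described above; the blur kernel $B$, the scale factor $s$, and the operator names are notation introduced here for illustration and do not appear in the original text.

```latex
L_{LF} = (H_{ORG} * B)\downarrow_{s}, \qquad
H_{LF} = \mathrm{bicubic}\uparrow_{s}(L_{LF}), \qquad
H_{HF} = H_{ORG} - H_{LF}
```

By construction, $H_{ORG} = H_{LF} + H_{HF}$, so recovering the HR image amounts to estimating the missing high-frequency component.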


Figure 1. Process of dictionary learning stage


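The next step is training the dictionaries. The set of patches $\{p_l^k\}_k$ is trained by the K-SVD
algorithm, resulting in LMD as in (2):

$$\mathrm{LMD}, \{q^k\}_k = \arg\min_{\mathrm{LMD}, \{q^k\}_k} \sum_k \| p_l^k - \mathrm{LMD} \cdot q^k \|_2^2 \quad \text{s.t.} \quad \| q^k \|_0 \le L \;\; \forall k \tag{2}$$

where $\{q^k\}_k$ are the sparse representation vectors and $L$ bounds the number of non-zero entries in
each of them [5]. Here, the assumption is made that each patch $p_h^k$ can be recovered by the
approximation $p_h^k \approx \mathrm{HMD} \cdot q^k$. Hence, HMD can be obtained by minimizing the mean
approximation error:

$$\mathrm{HMD} = \arg\min_{\mathrm{HMD}} \sum_k \| p_h^k - \mathrm{HMD} \cdot q^k \|_2^2 \tag{3}$$

Let the matrix $P_h$ consist of the columns $\{p_h^k\}_k$ and the matrix $Q$ consist of the columns
$\{q^k\}_k$ [5]. Therefore:

$$\mathrm{HMD} = \arg\min_{\mathrm{HMD}} \| P_h - \mathrm{HMD} \cdot Q \|_F^2 \tag{4}$$

The solution of (4) is (5):

$$\mathrm{HMD} = P_h Q^{+} = P_h Q^{T} (Q Q^{T})^{-1} \tag{5}$$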
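The least-squares problem for HMD has the standard closed-form solution $\mathrm{HMD} = P_h Q^{T} (Q Q^{T})^{-1}$ whenever $Q Q^{T}$ is invertible, and this can be checked numerically on a toy example. The sketch below is plain Python with hypothetical helper names (`matmul`, `transpose`, `inverse`, `fit_high_dictionary`); the small matrices are made-up illustration data, and a practical implementation would use a numerical library and a pseudo-inverse instead.

```python
def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def inverse(A):
    """Gauss-Jordan inverse of a small, non-singular square matrix."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fit_high_dictionary(P_h, Q):
    """Return P_h Q^T (Q Q^T)^{-1}, assuming Q Q^T is invertible.

    P_h holds one HR patch per column; Q holds the matching sparse codes
    per column, so the result has one column per dictionary atom.
    """
    Qt = transpose(Q)
    return matmul(matmul(P_h, Qt), inverse(matmul(Q, Qt)))


# Toy check: build patches from a known dictionary, then recover it.
hmd_true = [[1.0, 2.0], [0.0, 1.0]]          # 2 atoms in R^2 (illustrative)
Q = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]       # sparse codes of 3 patches
P_h = matmul(hmd_true, Q)                    # noiseless HR patches
hmd_est = fit_high_dictionary(P_h, Q)
```

Because the toy patches are generated without noise, `hmd_est` matches `hmd_true` up to rounding error.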
Next, the residual dictionary is trained as follows. Using the main dictionary and $H_{LF}$, the HR MHF
image, denoted by $H_{MHF}$, is obtained. Using $H_{MHF}$, the HR temporary image $H_{TMP}$, which contains
more details than $H_{LF}$, and the HR RHF image, denoted by $H_{RHF}$, are obtained. Thus, the residual
dictionary is obtained by utilizing $H_{TMP}$ and $H_{RHF}$. MD and RD together are called the dual
dictionaries.
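The training images of the residual layer can be written as the following pair of relations. The first matches the construction of the temporary image given in the synthesis stage (section 5.2). The second is an assumption made here for illustration: it is not stated explicitly in the text, but it is the definition consistent with the final synthesis step, in which the estimated image is the sum of the temporary image and the RHF image.

```latex
H_{TMP} = H_{LF} + H_{MHF}, \qquad
H_{RHF} = H_{ORG} - H_{TMP}
```

Under this reading, the residual dictionary learns whatever high-frequency content the main dictionary failed to recover.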


5.2. Image super-resolution stage 

In this stage, an input LR image is converted into an estimated HR image, as shown in Figure 2. It
is assumed that the input LR image was produced from an HR image by the same blur and the same
down-sampling factor as those used in the learning stage. First, the input LR image, denoted by
$L_{input}$, is interpolated by the bicubic method, which results in the HR low-frequency image denoted by
$H_{LF}$. The HR MHF image, denoted by $H_{MHF}$, is then obtained from $H_{LF}$ and MD. $H_{LF}$ is
filtered with the same high-pass filters as in the learning stage, the patches $\{p_l^k\}_k$ are extracted
from the filtered images, and OMP is employed to obtain their sparse vectors $\{q^k\}_k$. The HR patches
are then given by (6):

$$\{\hat{p}_h^k\}_k = \{\mathrm{HMD} \cdot q^k\}_k \tag{6}$$


The high-resolution patches $\{\hat{p}_h^k\}_k$ are generated by the product of HMD and the vectors
$\{q^k\}_k$, as in (6). Let $S_k$ be defined as an operator that extracts a patch from the HR image at
location $k$. The HR MHF image $H_{MHF}$ is constructed by solving the minimization problem (7):

$$H_{MHF} = \arg\min_{H_{MHF}} \sum_k \| S_k H_{MHF} - \hat{p}_h^k \|_2^2 \tag{7}$$

The above optimization problem can be solved by least squares, and the solution is given by (8):

$$H_{MHF} = \Big[ \sum_k S_k^{T} S_k \Big]^{-1} \sum_k S_k^{T} \hat{p}_h^k \tag{8}$$
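Since each $S_k^{T} S_k$ is a diagonal matrix marking the pixels covered by patch $k$, the least-squares solution above reduces to averaging, at every pixel, the values of all overlapping patches that cover it. The plain-Python sketch below illustrates this; the function names `extract_patches` and `merge_patches` and the toy 5x5 image are hypothetical and chosen only for illustration.

```python
def extract_patches(img, size, stride):
    """Return (top, left, patch) for every patch position on a regular grid."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            out.append((i, j, [row[j:j + size] for row in img[i:i + size]]))
    return out


def merge_patches(patches, h, w, size):
    """Average overlapping patches per pixel.

    `acc` accumulates the sum of S_k^T p_k and `cnt` is the diagonal of the
    sum of S_k^T S_k (how many patches cover each pixel).
    """
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for top, left, patch in patches:
        for di in range(size):
            for dj in range(size):
                acc[top + di][left + dj] += patch[di][dj]
                cnt[top + di][left + dj] += 1
    # Pixels covered by no patch are left at zero (the system is singular there).
    return [[acc[i][j] / cnt[i][j] if cnt[i][j] else 0.0 for j in range(w)]
            for i in range(h)]


# Toy check: patches cut from an image merge back to the same image.
image = [[float(5 * i + j) for j in range(5)] for i in range(5)]
patches = extract_patches(image, size=3, stride=2)
restored = merge_patches(patches, 5, 5, size=3)
```

In the synthesis stage the merged patches would be the estimated HR patches rather than patches cut from a known image, so the averaging also smooths out disagreements between neighbouring estimates.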


Afterwards, the high-resolution temporary image $H_{TMP}$ is generated by summing $H_{LF}$ and
$H_{MHF}$. Next, using the residual dictionary and $H_{TMP}$, a similar image reconstruction is carried
out, resulting in the synthesis of $H_{RHF}$. Finally, the HR estimated image $H_{EST}$ is generated by
adding $H_{TMP}$ and $H_{RHF}$. Figure 2 depicts the complete operation.


Figure 2. Process of image super-resolution stage


6. EXPERIMENTAL RESULTS 

The results of the proposed method are discussed in this section. Based on [17], various dictionary
sizes were tried, and it was observed by trial and error that a size of 500 atoms yielded better results.
Hence, the number of atoms in both the main dictionary and the residual dictionary is set to 500. The
number of atoms used in the representation of each image patch is set to 3 [18], [19]. A patch size that is
too large or too small tends to yield over-smoothing or unwanted artifacts [20]. Hence, the image patch
size is taken as 9×9, with an overlap of one pixel between adjacent patches. The down-sampling scale factor
is set to two. A 5×5 Gaussian filter is used for blurring. A convolution function is used to extract
features. Experiments are conducted on the MATLAB R2018a platform. The dictionaries are trained by the
K-SVD dictionary training algorithm. The trained main dictionary and residual dictionary are stored as
.mat files. The experiments are carried out on two standard data sets, Set 5 and Set 14. The test images
of Set 5 are shown in Figure 3.
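The settings listed above can be collected in one place, together with the patch grid they imply. The sketch below is illustrative only: the dictionary keys and the helper `patches_per_side` are hypothetical names, and it assumes that "overlapped by one pixel" means a stride of patch size minus one, which the text does not state explicitly. The 512-pixel side length is used purely as an example grid.

```python
# Experimental settings as reported in the text (key names are illustrative).
SETTINGS = {
    "dictionary_atoms": 500,   # atoms in each of the main and residual dictionaries
    "atoms_per_patch": 3,      # sparsity used when coding each patch
    "patch_size": 9,           # patches are 9x9 pixels
    "patch_overlap": 1,        # overlap between adjacent patches, in pixels
    "scale_factor": 2,         # down-sampling factor
    "blur_kernel": (5, 5),     # size of the Gaussian blur filter
}


def patches_per_side(image_side, patch_size, overlap):
    """Number of patch positions along one image side for a regular grid.

    Assumes stride = patch_size - overlap and that patches must fit
    entirely inside the image.
    """
    stride = patch_size - overlap
    return (image_side - patch_size) // stride + 1


per_side = patches_per_side(512, SETTINGS["patch_size"], SETTINGS["patch_overlap"])
total_patches = per_side ** 2
```

Under these assumptions a 512×512 grid yields 63 patch positions per side, that is, 3969 patches in total.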

The different stages of obtaining the super-resolution image from the LR image are depicted in
Figure 4, taking the 'man' image as an example. The input image of size 512×512 is shown in Figure 4(a).
The HR low-frequency image $H_{LF}$, obtained by interpolating the LR image by the bicubic method, is shown
in Figure 4(b). Using the main dictionary and $H_{LF}$, the HR MHF image denoted by $H_{MHF}$ is obtained,
as shown in Figure 4(c). The HR RHF image denoted by $H_{RHF}$ is shown in Figure 4(d). The final
super-resolution image is shown in Figure 4(e). It can be noticed that the SR image has fewer visual
artifacts and sharper details.


Super resolution image reconstruction via dual dictionary learning ... (Shashi Kiran Seetharamaswamy)
ISSN: 2088-8708


Figure 4. Different stages of obtaining super-resolution image: (a) input image, (b) $H_{LF}$,
(c) $H_{MHF}$, (d) $H_{RHF}$, and (e) super-resolution image


Table 1 tabulates the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM)
values for the images of Set 5. Table 2 tabulates the PSNR and SSIM values for ten images of Set 14.
Table 3 tabulates the PSNR and SSIM values for ten images of the B100 dataset. The results of the proposed
method are compared with state-of-the-art SR algorithms. Table 4 tabulates the PSNR values of various
methods and the proposed method for scale factor ×2 on the Set 5 and Set 14 datasets. Table 5 tabulates
the SSIM values of different methods and the proposed method for scale factor ×2 on the Set 5 and Set 14
datasets. From Tables 4 and 5, it can be observed that the proposed method achieves the highest average
PSNR among the compared methods, while its SSIM values are on par with those of the best-performing
networks (RFANet and RDN).
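For reference, PSNR is computed from the mean squared error between a reference image and its estimate. The sketch below uses the standard definition and assumes 8-bit images with a peak value of 255; the function name `psnr` is illustrative, and the text does not state which channel or colour space was used in the experiments.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are lists of rows of pixel values; `peak` is the maximum
    possible pixel value (255 is assumed for 8-bit images).
    """
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example: every pixel is off by one grey level, so the MSE is 1.
ref = [[10, 20], [30, 40]]
est = [[11, 21], [31, 41]]
value = psnr(ref, est)
```

A higher PSNR means a smaller mean squared error, which is why it is used alongside SSIM as a quantitative quality measure.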


Table 1. PSNR and SSIM for images of Set 5

Sl. No.   Image       PSNR (dB)   SSIM
1         Baby        39.5        0.9628
2         Bird        39.76       0.9645
3         Butterfly   37.62       0.9588
4         Head        39.12       0.9614
5         Woman       37.2        0.9588
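The Set 5 average PSNR reported for the proposed method in Table 4 (38.64) can be reproduced from the per-image values in Table 1. The short sketch below does this arithmetic; the variable names are illustrative.

```python
# Per-image PSNR values for Set 5, copied from Table 1.
set5_psnr = {
    "Baby": 39.5,
    "Bird": 39.76,
    "Butterfly": 37.62,
    "Head": 39.12,
    "Woman": 37.2,
}

# Arithmetic mean over the five images.
mean_psnr = sum(set5_psnr.values()) / len(set5_psnr)
print(round(mean_psnr, 2))  # 38.64
```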


Table 2. PSNR and SSIM for images of Set 14

Sl. No.   Image        PSNR (dB)   SSIM
1         Baboon       30.10       0.8612
2         Barbara      32.96       0.9113
3         Coastguard   32.07       0.9023
4         Face         36.90       0.9591
5         Flowers      33.29       0.9143
6         Foreman      35.10       0.9432
7         Lenna        35.61       0.9522
8         Man          33.76       0.9178
9         Monarch      34.21       0.9203
10        Pepper       37.31       0.9561


Table 3. PSNR and SSIM for ten images of B100 dataset

Sl. No.   Image     PSNR (dB)   SSIM
1         189080    30.17       0.8312
2         227092    33.51       0.8640
3         14037     32.42       0.8166
4         45096     33.72       0.8359
5         106024    31.10       0.8642
6         143090    31.82       0.8590
7         241004    30.52       0.8551
8         253055    31.01       0.8945
9         260058    31.72       0.8561
10        296007    30.96       0.8518
Table 4. Benchmark results. Average PSNR (dB) for scale factor ×2 on Set 5 and Set 14 datasets

Sl. No.   Method                                                               Set 5   Set 14
1         Bicubic                                                              33.66   30.23
2         Neighbor embedding with locally linear embedding (NE + LLE) [21]    35.77   31.76
3         Anchored neighborhood regression (ANR) [22]                          35.83   31.80
4         KK [23]                                                              36.20   32.11
5         SelfExSR [24]                                                        36.49   32.44
6         A+ [25]                                                              36.54   32.28
7         RFL [26]                                                             36.55   32.36
8         Super-resolution convolutional neural network (SRCNN) [27]          36.65   32.29
9         Sparse coding based network (SCN) [28]                               36.76   32.48
10        Very deep super-resolution convolutional networks (VDSR) [29]       37.53   32.97
11        Deeply-recursive convolutional network (DRCN) [30]                   37.63   33.04
12        Unfolding super resolution network (USRNet) [31]                     37.72   33.49
13        Deep recursive residual network (DRRN) [32]                          37.74   33.23
14        Information distillation network (IDN) [33]                          37.83   33.30
15        MADNet [34]                                                          37.94   33.46
16        Enhanced deep super-resolution network (EDSR) [35]                   38.20   34.02
17        Residual feature aggregation network (RFANet) [36]                   38.26   34.16
18        Residual dense network (RDN) [37]                                    38.30   34.10
19        Proposed dual dictionary learning method                             38.64   34.52


Table 5. Benchmark results. SSIM for scale factor ×2 on Set 5 and Set 14 datasets

Sl. No.   Method                                     Set 5    Set 14
1         Bicubic                                    0.9299   0.8688
2         SRCNN [27]                                 0.9542   0.9063
3         SCN [28]                                   0.9590   0.9123
4         VDSR [29]                                  0.9587   0.9124
5         DRCN [30]                                  0.9588   0.9118
6         DRRN [32]                                  0.9591   0.9136
7         IDN [33]                                   0.9600   0.9148
8         MADNet [34]                                0.9604   0.9167
9         EDSR [35]                                  0.9602   0.9195
10        RFANet [36]                                0.9615   0.9220
11        RDN [37]                                   0.9614   0.9212
12        Proposed dual dictionary learning method   0.9614   0.9213
Visual results for the Set 5 images are shown in Figure 5. Figures 5(a) to 5(e) show the LR images
of baby, bird, butterfly, head, and woman, and Figures 5(f) to 5(j) show the corresponding HR images,
respectively. It can be observed that the proposed method results in higher image quality.


Figure 5. Low-resolution and high-resolution images of baby, bird, butterfly, head and woman: (a) LR image
of baby, (b) LR image of bird, (c) LR image of butterfly, (d) LR image of head, (e) LR image of woman,
(f) HR image of baby, (g) HR image of bird, (h) HR image of butterfly, (i) HR image of head,
and (j) HR image of woman
7. CONCLUSION

This paper presented a method for SR based on dual dictionary learning and sparse representation.
The method reconstructs lost high-frequency details by utilizing main dictionary learning and residual
dictionary learning. The qualitative results given in the experimental section demonstrate that the SR
image obtained is of higher quality. The average PSNR of 38.64 dB on the Set 5 dataset and 34.52 dB on the
Set 14 dataset, higher than that of the other compared methods, also demonstrates the quantitative
improvement.